2022
Journal article
Open Access
Distilled neural networks for efficient learning to rank
Nardini F. M., Rulli C., Trani S., Venturini R.
Recent studies in Learning to Rank have shown the possibility to effectively distill a neural network from an ensemble of regression trees. This result leads neural networks to become a natural competitor of tree-based ensembles on the ranking task. Nevertheless, ensembles of regression trees outperform neural models both in terms of efficiency and effectiveness, particularly when scoring on CPU. In this paper, we propose an approach for speeding up neural scoring time by applying a combination of Distillation, Pruning and Fast Matrix multiplication. We employ knowledge distillation to learn shallow neural networks from an ensemble of regression trees. Then, we exploit an efficiency-oriented pruning technique that performs a sparsification of the most computationally-intensive layers of the neural network, which are then scored with optimized sparse matrix multiplication. Moreover, by studying both dense and sparse high-performance matrix multiplication, we develop a scoring-time prediction model which helps in devising neural network architectures that match the desired efficiency requirements. Comprehensive experiments on two public learning-to-rank datasets show that neural networks produced with our novel approach are competitive at any point of the effectiveness-efficiency trade-off when compared with tree-based ensembles, providing up to 4x scoring time speed-up without affecting the ranking quality.
Source: IEEE transactions on knowledge and data engineering (Online) 35 (2022): 4695–4712. doi:10.1109/TKDE.2022.3152585
DOI: 10.1109/tkde.2022.3152585
See at:
ISTI Repository | ieeexplore.ieee.org | CNR ExploRA
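As an illustration of the pruning-and-sparse-scoring idea described in the abstract, the sketch below magnitude-prunes the largest layer of a small feed-forward scorer. The layer sizes, the pruning criterion, and the plain NumPy products are illustrative assumptions, not the paper's actual pipeline (which relies on optimized sparse matrix multiplication and a learned scoring-time model).

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude entries of a weight matrix (illustrative)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def score(x, layers):
    """Feed-forward scoring of one feature vector (ReLU on hidden layers only)."""
    h = x
    for i, (w, b) in enumerate(layers):
        h = w @ h + b
        if i < len(layers) - 1:
            h = np.maximum(h, 0.0)
    return h

rng = np.random.default_rng(0)
dense = [(rng.normal(size=(32, 16)), np.zeros(32)),   # hidden layer
         (rng.normal(size=(1, 32)), np.zeros(1))]     # output layer
# sparsify the most computationally intensive (first) layer at 80% sparsity
sparse = [(magnitude_prune(dense[0][0], 0.8), dense[0][1]), dense[1]]
```

In a real deployment the pruned matrix would be stored in a compressed sparse format so that the zeroed multiplications are skipped entirely.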
2021
Conference article
Open Access
TSXor: a simple time series compression algorithm
Bruno A., Nardini F. M., Pibiri G. E., Trani R., Venturini R.
Time series are ubiquitous in computing as a key ingredient of many machine learning analytics, ranging from classification to forecasting. Typically, the training of such machine learning algorithms on time series requires accessing the data in temporal order several times. Therefore, a compression algorithm providing good compression ratios and fast decompression speed is desirable. In this paper, we present TSXor, a simple yet effective lossless compressor for time series. The main idea is to exploit the redundancy/similarity between close-in-time values through a window that acts as a cache, so as to improve the compression ratio and decompression speed. We show that TSXor achieves up to 3× better compression and up to 2× faster decompression than the state of the art on real-world datasets.
Source: SPIRE 2021 - International Symposium on String Processing and Information Retrieval, pp. 217–223, Lille, France (Virtual Event), 04/10/2021-06/10/2021
DOI: 10.1007/978-3-030-86692-1_18
See at:
ISTI Repository | link.springer.com | CNR ExploRA
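The window-as-cache idea behind TSXor can be sketched as follows. This toy version emits (tag, payload) tokens instead of TSXor's actual byte-level encoding, and the back-reference/XOR rules are simplified assumptions; it only shows why a cache of recent values makes close-in-time redundancy cheap to exploit.

```python
import struct

def compress(values, window=32):
    """Encode each double as a back-reference into the window when an
    identical value was seen recently, otherwise as an XOR with the
    previous value (small when consecutive values are similar)."""
    out, recent = [], []
    for v in values:
        bits = struct.unpack('<Q', struct.pack('<d', v))[0]
        if bits in recent:
            out.append(('ref', recent.index(bits)))
        else:
            prev = recent[-1] if recent else 0
            out.append(('xor', bits ^ prev))
        recent.append(bits)
        if len(recent) > window:
            recent.pop(0)
    return out

def decompress(tokens, window=32):
    """Replay the same window to invert the encoding."""
    values, recent = [], []
    for tag, payload in tokens:
        if tag == 'ref':
            bits = recent[payload]
        else:
            prev = recent[-1] if recent else 0
            bits = payload ^ prev
        values.append(struct.unpack('<d', struct.pack('<Q', bits))[0])
        recent.append(bits)
        if len(recent) > window:
            recent.pop(0)
    return values
```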
2021
Journal article
Open Access
Fast filtering of search results sorted by attribute
Nardini F. M., Trani R., Venturini R.
Modern search services often provide multiple options to rank the search results, e.g., sort "by relevance", "by price" or "by discount" in e-commerce. While the traditional rank by relevance effectively places the relevant results in the top positions of the results list, the rank by attribute could place many marginally relevant results at the head of the results list, leading to poor user experience. In the past, this issue has been addressed by investigating the relevance-aware filtering problem, which asks to select the subset of results maximizing the relevance of the attribute-sorted list. Recently, an exact algorithm has been proposed to solve this problem optimally. However, the high computational cost of the algorithm makes it impractical for the Web search scenario, which is characterized by huge lists of results and strict time constraints. For this reason, the problem is often solved using efficient yet inaccurate heuristic algorithms. In this paper, we first prove the performance bounds of the existing heuristics. We then propose two efficient and effective algorithms to solve the relevance-aware filtering problem. First, we propose OPT-Filtering, a novel exact algorithm that is faster than the existing state-of-the-art optimal algorithm. Second, we propose an approximate and even more efficient algorithm, ε-Filtering, which, given an allowed approximation error ε, finds a (1-ε)-optimal filtering, i.e., the relevance of its solution is at least (1-ε) times the optimum. We conduct a comprehensive evaluation of the two proposed algorithms against state-of-the-art competitors on two real-world public datasets. Experimental results show that OPT-Filtering achieves a significant speedup of up to two orders of magnitude with respect to the existing optimal solution, while ε-Filtering further improves this result by trading effectiveness for efficiency. In particular, experiments show that ε-Filtering can achieve quasi-optimal solutions while being faster than all state-of-the-art competitors in most of the tested configurations.
Source: ACM transactions on information systems 40 (2021). doi:10.1145/3477982
DOI: 10.1145/3477982
See at:
ISTI Repository | dl.acm.org | CNR ExploRA
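As a toy illustration of the relevance-aware filtering problem itself (not of OPT-Filtering or ε-Filtering), the sketch below exhaustively searches for the size-k subset of an attribute-sorted list that maximizes the relevance of the resulting list; using DCG as the list-relevance measure is an assumption for illustration.

```python
from itertools import combinations
from math import log2

def dcg(rels):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(r / log2(i + 2) for i, r in enumerate(rels))

def best_filtering(rels, k):
    """Exhaustive search over size-k subsets that preserve the attribute
    order (toy baseline; the paper's algorithms avoid enumeration)."""
    best = max(combinations(range(len(rels)), k),
               key=lambda idx: dcg([rels[i] for i in idx]))
    return list(best)
```

Because `combinations` yields indices in increasing order, every candidate subset keeps the original attribute-sorted order, which is exactly the constraint of the problem.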
2019
Journal article
Open Access
Parallel Traversal of Large Ensembles of Decision Trees
Lettich F., Lucchese C., Nardini F. M., Orlando S., Perego R., Tonellotto N., Venturini R.
Machine-learnt models based on additive ensembles of regression trees are currently deemed the best solution to address complex classification, regression, and ranking tasks. The deployment of such models is computationally demanding: to compute the final prediction, the whole ensemble must be traversed by accumulating the contributions of all its trees. In particular, traversal cost impacts applications where the number of candidate items is large, the time budget available to apply the learnt model to them is limited, and the users' expectations in terms of quality-of-service are high. Document ranking in web search, where sub-optimal ranking models are deployed to find a proper trade-off between efficiency and effectiveness of query answering, is probably the most typical example of this challenging issue. This paper investigates multi/many-core parallelization strategies for speeding up the traversal of large ensembles of regression trees, thus obtaining machine-learnt models that are, at the same time, effective, fast, and scalable. Our best results are obtained by the GPU-based parallelization of the state-of-the-art algorithm, with speedups of up to 102.6x.
Source: IEEE transactions on parallel and distributed systems (Print) 30 (2019): 2075–2089. doi:10.1109/TPDS.2018.2860982
DOI: 10.1109/tpds.2018.2860982
DOI: 10.5281/zenodo.2668378
DOI: 10.5281/zenodo.2668379
Project(s): BigDataGrapes
See at:
IEEE Transactions on Parallel and Distributed Systems | ZENODO | ZENODO | Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari | ISTI Repository | IEEE Transactions on Parallel and Distributed Systems | ieeexplore.ieee.org | CNR ExploRA
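The per-document work that the paper parallelizes can be sketched as a plain sequential traversal: every tree of the ensemble is walked root to leaf and its contribution accumulated. The tuple-based tree representation below is an illustrative assumption; the paper's contribution lies in the multicore/GPU strategies, not in this baseline.

```python
def traverse(tree, x):
    """Walk one tree for feature vector x.
    Internal nodes are (feature, threshold, left, right); leaves are floats."""
    node = tree
    while isinstance(node, tuple):
        feat, thr, left, right = node
        node = left if x[feat] <= thr else right
    return node

def score_ensemble(ensemble, x):
    """The whole ensemble must be traversed; each tree adds its contribution."""
    return sum(traverse(t, x) for t in ensemble)
```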
2019
Journal article
Open Access
Handling massive n-gram datasets efficiently
Pibiri G. E., Venturini R.
Two fundamental problems concern the handling of large n-gram language models: indexing, that is, compressing the n-grams and associated satellite values without compromising their retrieval speed, and estimation, that is, computing the probability distribution of the n-grams extracted from a large textual source. Performing these two tasks efficiently is vital for several applications in the fields of Information Retrieval, Natural Language Processing, and Machine Learning, such as auto-completion in search engines and machine translation. Regarding the problem of indexing, we describe compressed, exact, and lossless data structures that simultaneously achieve high space reductions and no time degradation with respect to the state-of-the-art solutions and related software packages. In particular, we present a compressed trie data structure in which each word of an n-gram following a context of fixed length k, that is, its preceding k words, is encoded as an integer whose value is proportional to the number of words that follow such context. Since the number of words following a given context is typically very small in natural languages, we lower the space of representation to compression levels that were never achieved before, allowing the indexing of billions of strings. Despite the significant savings in space, our technique introduces a negligible penalty at query time. Specifically, the most space-efficient competitors in the literature, which are both quantized and lossy, do not take less space than our trie data structure and are up to 5 times slower. Conversely, our trie is as fast as the fastest competitor but also retains an advantage of up to 65% in absolute space. Regarding the problem of estimation, we present a novel algorithm for estimating modified Kneser-Ney language models, which have emerged as the de-facto choice for language modeling in both academia and industry thanks to their relatively low perplexity performance. Estimating such models from large textual sources poses the challenge of devising algorithms that make a parsimonious use of the disk. The state-of-the-art algorithm uses three sorting steps in external memory: we show an improved construction that requires only one sorting step by exploiting the properties of the extracted n-gram strings. With an extensive experimental analysis performed on billions of n-grams, we show an average improvement of 4.5× in the total runtime of the previous approach.
Source: ACM transactions on information systems 37 (2019). doi:10.1145/3302913
DOI: 10.1145/3302913
DOI: 10.48550/arxiv.1806.09447
DOI: 10.5281/zenodo.3257994
DOI: 10.5281/zenodo.3257995
Project(s): BigDataGrapes
See at:
arXiv.org e-Print Archive | ZENODO | ZENODO | ISTI Repository | ACM Transactions on Information Systems | dl.acm.org | ACM Transactions on Information Systems | doi.org | CNR ExploRA
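The context-based remapping at the heart of the indexing technique can be sketched as follows: each word following a given context receives a small integer identifier, local to that context, so the integers to be compressed stay proportional to the (typically tiny) number of distinct continuations. The dictionary-based representation is an illustrative simplification of the compressed trie.

```python
from collections import defaultdict

def remap(ngrams):
    """Map each word following a context to a small id local to that context.
    ngrams: iterable of word tuples; the last word is the continuation."""
    vocab = defaultdict(dict)          # context -> {word: local id}
    encoded = []
    for ng in ngrams:
        context, word = tuple(ng[:-1]), ng[-1]
        ids = vocab[context]
        encoded.append((context, ids.setdefault(word, len(ids))))
    return encoded, vocab
```

In natural-language data most contexts admit only a handful of continuations, so these local ids are far smaller than global vocabulary ids and compress much better.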
2019
Conference article
Open Access
Fast Approximate Filtering of Search Results Sorted by Attribute
Nardini F. M., Trani R., Venturini R.
Several Web search services enable their users with the possibility of sorting the list of results by a specific attribute, e.g., sort "by price" in e-commerce. However, sorting the results by attribute could bring marginally relevant results in the top positions, thus leading to a poor user experience. This motivates the definition of the relevance-aware filtering problem. This problem asks to remove results from the attribute-sorted list to maximize its final overall relevance. Recently, an optimal solution to this problem has been proposed. However, it has strong limitations in the Web scenario due to its high computational cost. In this paper, we propose ε-Filtering: an efficient approximate algorithm with strong approximation guarantees on the relevance of the final list. More precisely, given an allowed approximation error ε, the proposed algorithm finds a (1-ε)-optimal filtering, i.e., the relevance of its solution is at least (1-ε) times the optimum. We conduct a comprehensive evaluation of ε-Filtering against state-of-the-art competitors on two real-world public datasets. Experiments show that ε-Filtering achieves the desired levels of effectiveness with a speedup of up to two orders of magnitude with respect to the optimal solution while guaranteeing very small approximation errors.
Source: 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 815–824, Paris, France, 21-25/07/2019
DOI: 10.1145/3331184.3331227
Project(s): BigDataGrapes
See at:
ISTI Repository | dl.acm.org | doi.org | CNR ExploRA
2018
Conference article
Open Access
Efficient and effective query expansion for web search
Lucchese C., Nardini F. M., Perego R., Trani R., Venturini R.
Query Expansion (QE) techniques expand the user queries with additional terms, e.g., synonyms and acronyms, to enhance the system recall. State-of-the-art solutions employ machine learning methods to select the most suitable terms. However, most of them neglect the cost of processing the expanded queries, thus selecting effective, yet very expensive, terms. The goal of this paper is to enable QE in scenarios with tight time constraints by proposing a QE framework based on structured queries and efficiency-aware term selection strategies. In particular, the proposed expansion selection strategies aim at capturing the efficiency and the effectiveness of the expansion candidates, as well as the dependencies among them. We evaluate our proposals by conducting an extensive experimental assessment on real-world search engine data and public TREC data. Results confirm that our approach leads to a remarkable efficiency improvement w.r.t. the state-of-the-art: a reduction of the retrieval time up to 30 times, with only a small loss of effectiveness.
Source: International Conference on Information and Knowledge Management (CIKM), pp. 1551–1554, 22-26/10/2018
DOI: 10.1145/3269206.3269305
DOI: 10.5281/zenodo.2668248
DOI: 10.5281/zenodo.2668249
Project(s): BigDataGrapes
See at:
ZENODO | ZENODO | ISTI Repository | zenodo.org | zenodo.org | dl.acm.org | doi.org | CNR ExploRA
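A minimal sketch of an efficiency-aware selection strategy in the spirit of the abstract above: pick expansion terms by their effectiveness-to-cost ratio under a processing-time budget. The greedy ratio rule and the candidate triples are assumptions for illustration, not the paper's learned selection strategies.

```python
def select_terms(candidates, budget):
    """candidates: list of (term, expected_gain, processing_cost).
    Greedily take terms by gain/cost ratio while the budget allows."""
    chosen, spent = [], 0.0
    for term, gain, cost in sorted(candidates,
                                   key=lambda t: t[1] / t[2],
                                   reverse=True):
        if spent + cost <= budget:
            chosen.append(term)
            spent += cost
    return chosen
```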
2017
Conference article
Restricted
Multicore/Manycore parallel traversal of large forests of regression trees
Lettich F., Lucchese C., Nardini F. M., Orlando S., Perego R., Tonellotto N., Venturini R.
Machine-learnt models based on additive ensembles of binary regression trees are currently considered one of the best solutions to address complex classification, regression, and ranking tasks. To evaluate these complex models over a continuous stream of data items with high throughput requirements, we need to optimize, and possibly parallelize, the traversal of thousands of trees, each including hundreds of nodes. Document ranking in Web Search is a typical example of this challenging scenario, where complex tree-based models are used to score query-document pairs and finally rank lists of document results for each incoming query (a.k.a. Learning-to-Rank). In this extended abstract, we briefly discuss some preliminary results concerning parallelization strategies for QUICKSCORER - the state-of-the-art scoring algorithm that exploits ensembles of decision trees - by using multicore CPUs (with SIMD coprocessors) and manycore GPUs. We show that QUICKSCORER, which transforms the traversal of thousands of decision trees into linear accesses to array data structures, can be parallelized very effectively, achieving significant speedups.
Source: HPCS 2017 - International Conference on High Performance Computing & Simulation, pp. 915–915, Genoa, Italy, 17-21 July 2017
DOI: 10.1109/hpcs.2017.154
See at:
doi.org | ieeexplore.ieee.org | CNR ExploRA
2017
Conference article
Open Access
QuickScorer: efficient traversal of large ensembles of decision trees
Lucchese C., Nardini F. M., Orlando S., Perego R., Tonellotto N., Venturini R.
Machine-learnt models based on additive ensembles of binary regression trees are currently deemed the best solution to address complex classification, regression, and ranking tasks. Evaluating these models is a computationally demanding task as it needs to traverse thousands of trees with hundreds of nodes each. The cost of traversing such large forests of trees significantly impacts their application to big and stream input data, when the time budget available for each prediction is limited to guarantee a given processing throughput. Document ranking in Web search is a typical example of this challenging scenario, where the exploitation of tree-based models to score query-document pairs, and finally rank lists of documents for each incoming query, is the state-of-the-art method for ranking (a.k.a. Learning-to-Rank). This paper presents QuickScorer, a novel algorithm for the traversal of huge ensembles of decision trees that, thanks to a cache- and CPU-aware design, provides a 9× speedup over best competitors.
Source: ECML PKDD - Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 383–387, Skopje, Macedonia, 18-22 September, 2017
DOI: 10.1007/978-3-319-71273-4_36
See at:
arpi.unipi.it | doi.org | link.springer.com | CNR ExploRA
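A simplified, single-tree rendition of QuickScorer's bitvector idea: leaves are numbered left to right, every node whose test evaluates to false clears the bits of the leaves in its left subtree, and the exit leaf is the leftmost surviving bit. The real algorithm processes false nodes feature by feature across the whole ensemble with cache-friendly array layouts; this sketch only shows the core invariant.

```python
def build_masks(tree):
    """Number leaves left to right; for each internal node record the
    bitmask of the leaves in its left subtree, which is cleared when the
    node's test x[feature] <= threshold is false."""
    leaves, nodes = [], []
    def walk(node):
        if not isinstance(node, tuple):          # leaf: its output value
            leaves.append(node)
            return 1 << (len(leaves) - 1)
        feat, thr, left, right = node
        left_bits = walk(left)
        bits = left_bits | walk(right)
        nodes.append((feat, thr, left_bits))
        return bits
    all_bits = walk(tree)
    return leaves, nodes, all_bits

def quickscore(tree, x):
    leaves, nodes, bitvec = build_masks(tree)
    for feat, thr, left_bits in nodes:
        if x[feat] > thr:                # false node: drop its left subtree
            bitvec &= ~left_bits
    exit_leaf = (bitvec & -bitvec).bit_length() - 1   # leftmost live leaf
    return leaves[exit_leaf]
```

Note that node tests are applied in arbitrary order with no root-to-leaf walk: the leftmost surviving leaf is provably the one a standard traversal would reach.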
2017
Conference article
Open Access
Dynamic Elias-Fano Representation
Pibiri G. E., Venturini R.
We show that it is possible to store a dynamic ordered set S(n,u) of n integers drawn from a bounded universe of size u in space close to the information-theoretic lower bound and yet preserve the asymptotic time optimality of the operations. Our results leverage the Elias-Fano representation of S(n,u), which takes EF(S(n,u)) = n⌈log(u/n)⌉ + 2n bits of space and can be shown to be less than half a bit per element away from the information-theoretic minimum.
Considering a RAM model with memory words of Θ(log u) bits, we focus on the case in which the integers of S are drawn from a polynomial universe of size u = n^γ, for any γ = Θ(1). We represent S(n,u) with EF(S(n,u)) + o(n) bits of space and:
1. support static predecessor/successor queries in O(min{1 + log(u/n), log log n}) time;
2. make S grow in an append-only fashion by spending O(1) time per inserted element;
3. support random access in O(log n / log log n) worst-case time, insertions/deletions in O(log n / log log n) amortized time, and predecessor/successor queries in O(min{1 + log(u/n), log log n}) worst-case time. These time bounds are optimal.
Source: Annual Symposium on Combinatorial Pattern Matching, Warsaw, Poland, 4-6/07/2017
DOI: 10.4230/lipics.cpm.2017.30
See at:
drops.dagstuhl.de | ISTI Repository | CNR ExploRA
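The (static) Elias-Fano representation underlying the abstract above can be sketched as follows: each integer is split into ⌈log2(u/n)⌉ low bits stored verbatim and a high part encoded in unary over the gaps, for n⌈log2(u/n)⌉ plus at most 2n bits overall. The list-of-bits encoding below is for illustration only; a real implementation packs everything into machine words.

```python
from math import ceil, log2

def ef_encode(sorted_ints, universe):
    """Split each value into fixed-width low bits and a unary-coded high part."""
    n = len(sorted_ints)
    low_bits = max(0, ceil(log2(universe / n)))      # ⌈log2(u/n)⌉ per element
    lows = [v & ((1 << low_bits) - 1) for v in sorted_ints]
    upper, prev = [], 0
    for v in sorted_ints:
        h = v >> low_bits
        upper.extend([0] * (h - prev) + [1])          # unary gap, then a 1
        prev = h
    return low_bits, lows, upper                      # len(upper) <= 2n bits

def ef_decode(low_bits, lows, upper):
    """Invert the encoding by scanning the unary upper part."""
    highs, h = [], 0
    for bit in upper:
        if bit:
            highs.append(h)
        else:
            h += 1
    return [(hi << low_bits) | lo for hi, lo in zip(highs, lows)]
```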
2017
Conference article
Restricted
Efficient Data Structures for Massive N-Gram Datasets
Pibiri G. E., Venturini R.
The efficient indexing of large and sparse N-gram datasets is crucial in several applications in Information Retrieval, Natural Language Processing and Machine Learning. Because of the stringent efficiency requirements, dealing with billions of N-grams poses the challenge of introducing a compressed representation that preserves the query processing speed. In this paper we study the problem of reducing the space required by the representation of such datasets, maintaining the capability of looking up a given N-gram within microseconds. For this purpose we describe compressed, exact and lossless data structures that achieve, at the same time, high space reductions and no time degradation with respect to state-of-the-art software packages. In particular, we present a trie data structure in which each word following a context of fixed length k, i.e., its preceding k words, is encoded as an integer whose value is proportional to the number of words that follow such context. Since the number of words following a given context is typically very small in natural languages, we are able to lower the space of representation to compression levels that were never achieved before. Despite the significant savings in space, we show that our technique introduces a negligible penalty at query time.
Source: International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 615–624, Tokyo, Japan, 7-11/08/2017
DOI: 10.1145/3077136.3080798
See at:
dl.acm.org | doi.org | CNR ExploRA
2017
Conference article
Open Access
An encoding for order-preserving matching
Gagie T., Manzini G., Venturini R.
Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of order-preserving pattern matching. This problem was introduced only a few years ago but has already attracted significant attention because of its applications in data analysis. Two strings are said to be an order-preserving match if the relative order of their characters is the same: e.g., 4, 1, 3, 2 and 10, 3, 7, 5 are an order-preserving match. We show how, given a string S[1..n] over an arbitrary alphabet of size σ and a constant c >= 1, we can build an O(n log log n)-bit encoding such that later, given a pattern P[1..m] with m <= log^c n, we can return the number of order-preserving occurrences of P in S in O(m) time. Within the same time bound we can also return the starting position of some order-preserving match for P in S (if such a match exists). We prove that our space bound is within a constant factor of optimal if log σ = Θ(log log n); our query time is optimal if log σ = Ω(log n). Our space bound contrasts with the Ω(n log n) bits needed in the worst case to store S itself, an index for order-preserving pattern matching with no restrictions on the pattern length, or an index for standard pattern matching even with restrictions on the pattern length. Moreover, we can build our encoding knowing only how each character compares to O(log^c n) neighbouring characters.
Source: 25th European Symposium on Algorithms, ESA 2017, pp. 38:1–38:15, Vienna, Austria, 4-6 September, 2017
DOI: 10.4230/lipics.esa.2017.38
DOI: 10.48550/arxiv.1610.02865
See at:
arXiv.org e-Print Archive | drops.dagstuhl.de | ISTI Repository | doi.org | doi.org | CNR ExploRA
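Order-preserving matching itself can be sketched naively as a pairwise comparison of relative orders; the paper's encoding exists precisely to answer such queries without this quadratic check and without storing the string.

```python
def op_match(a, b):
    """True iff a and b have the same relative order of characters."""
    if len(a) != len(b):
        return False
    n = len(a)
    return all((a[i] < a[j]) == (b[i] < b[j])
               for i in range(n) for j in range(n))

def count_op_occurrences(S, P):
    """Count order-preserving occurrences of P in S by sliding a window."""
    m = len(P)
    return sum(op_match(S[i:i + m], P) for i in range(len(S) - m + 1))
```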
2016
Journal article
Restricted
Space-Efficient Substring Occurrence Estimation
Orlandi A., Venturini R.
In this paper we study the problem of estimating the number of occurrences of substrings in textual data: a text of length n over some alphabet is preprocessed and an index is built. The index is used in lieu of the text to answer queries that return an approximated number of the occurrences of an arbitrary pattern as a substring of the text. The problem has its main application in selectivity estimation related to the LIKE predicate in textual databases. Our focus is on obtaining an algorithmic solution with guaranteed error rates and small footprint. To achieve that, we first enrich previous work in the area of compressed text indexing, providing a data structure that is optimal for any given additive error. We also approach the issue of guaranteeing exact answers for sufficiently frequent patterns, providing a data structure whose size scales with the number of such patterns. Our theoretical findings are supported by experiments showing the practical impact of our data structures.
Source: Algorithmica 74 (2016): 65–90. doi:10.1007/s00453-014-9936-y
DOI: 10.1007/s00453-014-9936-y
DOI: 10.1145/1989284.1989300
See at:
Algorithmica | doi.org | link.springer.com | CNR ExploRA
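A toy illustration of trading space for a guaranteed additive error, in the spirit of "exact answers for sufficiently frequent patterns" (the thresholding scheme below is an invented simplification, not the paper's compressed index): store exact counts only for q-grams occurring more than t times; any unstored pattern then has true count at most t, so answering t errs by at most t.

```python
from collections import Counter

def build_index(text, q, threshold):
    """Keep exact counts only for q-grams occurring more than `threshold` times."""
    counts = Counter(text[i:i + q] for i in range(len(text) - q + 1))
    return {g: c for g, c in counts.items() if c > threshold}

def estimate(index, pattern, threshold):
    """Exact for frequent patterns; additive error <= threshold otherwise."""
    return index.get(pattern, threshold)
```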
2016
Journal article
Restricted
Compressed Cache-Oblivious String B-Tree
Ferragina P., Venturini R.
In this article, we study three variants of the well-known prefix-search problem for strings, and we design solutions for the cache-oblivious model which improve the best known results. Among these contributions, we close (asymptotically) the classic problem, which asks for the detection of the set of strings that share the longest common prefix with a queried pattern, by providing an I/O-optimal solution that matches the space lower bound for tries up to a constant multiplicative factor of the form (1 + ε), for ε > 0. Our solutions hinge upon a novel compressed storage scheme that adds the ability to decompress prefixes of the stored strings I/O-optimally to the elegant locality-preserving front coding (Bender et al. 2006), still preserving its space bounds.
Source: ACM transactions on algorithms 12 (2016). doi:10.1145/2903141
DOI: 10.1145/2903141
DOI: 10.1007/978-3-642-40450-4_40
Project(s): SoBigData
See at:
dl.acm.org | doi.org | ACM Transactions on Algorithms | CNR ExploRA
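The front coding that the storage scheme above builds on can be sketched in its basic (non-locality-preserving) form: each string in a sorted dictionary stores only the length of the prefix it shares with its predecessor plus the remaining suffix.

```python
def front_code(sorted_strings):
    """Encode a sorted string list as (shared-prefix length, suffix) pairs."""
    out, prev = [], ""
    for s in sorted_strings:
        l = 0
        while l < min(len(s), len(prev)) and s[l] == prev[l]:
            l += 1
        out.append((l, s[l:]))
        prev = s
    return out

def front_decode(coded):
    """Rebuild each string from its predecessor's prefix and its own suffix."""
    out, prev = [], ""
    for l, suffix in coded:
        prev = prev[:l] + suffix
        out.append(prev)
    return out
```

The catch, which the paper's locality-preserving variant addresses, is that decoding one string may require unwinding an unbounded chain of predecessors.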